library(dplyr)
library(INLA)
library(inlabru)
library(sf)
library(terra)
# load some libraries to generate nice map plots
library(scico)
library(ggplot2)
library(patchwork)
library(mapview)
library(tidyterra)Practical
Geostatistics
In this practical we are going to fit a geostatisticsl model Encounter probability of Pacific Cod (Gadus macrocephalus) from a trawl survey.
In this practical we will:
Learn how to fit a spatial model in
inlabruLearn how to add spatial covariates to the model
Learn how to do predictions
Learn how to simulate from the fitted model
Libraries to load:
The data
In this practical, we will explore data on the Pacific Cod (Gadus macrocephalus) from a trawl survey in Queen Charlotte Sound. The pcod dataset is available from the sdmTMB package and contains the presence/absence records of the Pacific Cod during each surveys along with the biomass density of Pacific cod in the area swept (kg/Km\(^2\)). The qcs_grid data contain the depth values stored as \(2\times 2\) km grid for Queen Charlotte Sound.
The dataset contains presence/absence data from 2003 to 2017. In this practical we only consider year 2003.
We first load the dataset and select the year of interest
library(sdmTMB)
pcod_df = sdmTMB::pcod %>% filter(year==2003)
qcs_grid = sdmTMB::qcs_gridThen, we create ab sf object and assigne the roght coordinate reference to it:
pcod_sf = st_as_sf(pcod_df, coords = c("lon","lat"), crs = 4326)
pcod_sf = st_transform(pcod_sf,
crs = "+proj=utm +zone=9 +datum=WGS84 +no_defs +type=crs +units=km" )We convert the covariate into a raster and assign the same coordinate reference:
depth_r <- rast(qcs_grid, type = "xyz")
crs(depth_r) <- crs(pcod_sf)Finally we can plot our dataset. Note that to plot the raster we need to upload also the tidyterra library.
ggplot()+
geom_spatraster(data=depth_r$depth)+
geom_sf(data=pcod_sf,aes(color=factor(present))) +
scale_color_manual(name="Occupancy status for the Pacific Cod",
values = c("black","orange"),
labels= c("Absence","Presence"))+
scale_fill_scico(name = "Depth",
palette = "nuuk",
na.value = "transparent" ) + xlab("") + ylab("")The model
We first fit a simple model where we consider the observation as Bernoulli and where the linear predictor contains only one intercept and the GR field defined through the SPDE approach. The model is defined as:
Stage 1 Model for the response \[ y(s)|\eta(s)\sim\text{Binom}(1, p(s)) \] Stage 2 Latent field model \[ \eta(s) = \text{logit}(p(s)) = \beta_0 + \omega(s) \] with \[ \omega(s)\sim \text{ GF with range } \rho\ \text{ and maginal variance }\ \sigma^2 \]
Stage 3 Hyperparameters
The hyperparameters of the model are \(\rho\) and \(\sigma\)
NOTE In this case the linear predictor \(\eta\) consists of two components!!
The workflow
When fitting a geostatistical model we need to fullfill the following tasks:
Build the mesh
Define the SPDE representation of the spatial GF. This includes defining the priors for the range and sd of the spatial GF
Define the components of the linear predictot. This includes the spatial GF and all eventual covariates
Define the observation model using the
bru_obs()functionRun the model using the
bru()function
1. Building the mesh
The first task, when dealing with geostatistical models in inlabru is to build the mesh that covers the area of interest. For this purpose we use the function fm_messh_2d.
One way to build the mesh is to start from the locations where we have observations, these are contained in the dataset pcod_sf
mesh = fm_mesh_2d(loc = pcod_sf, # Build the mesh
cutoff = 2,
max.edge = c(7,20), # The largest allowed triangle edge length.
offset = c(5,50)) # The automatic extension distance
ggplot() + gg(mesh) + geom_sf(data= pcod_sf, aes(color = factor(present))) + xlab("") + ylab("")2. Define the SPDE representation of the spatial GF
To define the SPDE representation of the spatial GF we use the function inla.spde2.pcmatern. This takes as input the mesh we have defined and the PC-priors definition for \(\rho\) and \(\sigma\) (the range and the marginal standard deviation of the field).
PC priors Gaussian Random field are defined in (Fuglstad et al. 2018). From a practical perspective for the range \(\rho\) you need to define two paramters \(\rho_0\) and \(p_{\rho}\) such that you believe it is reasonable that \[ P(\rho<\rho_0)=p_{\rho} \] while for the margianal variance \(\sigma\) you need to define two parameters \(\sigma_0\) and \(p_{\sigma}\) such that you believe it is reasonable that \[ P(\sigma<\sigma_0)=p_{\sigma} \]
You can use the following function to compute and plot the prior distributions for the range and sd of the Matern field.
dens_prior_range = function(rho_0, p_alpha)
{
# compute the density of the PC prior for the
# range rho of the Matern field
# rho_0 and p_alpha are defined such that
# P(rho<rho_0) = p_alpha
rho = seq(0, rho_0*10, length.out =100)
alpha1_tilde = -log(p_alpha) * rho_0
dens_rho = alpha1_tilde / rho^2 * exp(-alpha1_tilde / rho)
return(data.frame(x = rho, y = dens_rho))
}
dens_prior_sd = function(sigma_0, p_sigma)
{
# compute the density of the PC prior for the
# sd sigma of the Matern field
# sigma_0 and p_sigma are defined such that
# P(sigma>sigma_0) = p_sigma
sigma = seq(0, sigma_0*10, length.out =100)
alpha2_tilde = -log(p_sigma)/sigma_0
dens_sigma = alpha2_tilde* exp(-alpha2_tilde * sigma)
return(data.frame(x = sigma, y = dens_sigma))
}Here are some alternatives for defining priors for our model
spde_model1 = inla.spde2.pcmatern(mesh,
prior.sigma = c(.1, 0.5),
prior.range = c(30, 0.5))
spde_model2 = inla.spde2.pcmatern(mesh,
prior.sigma = c(10, 0.5),
prior.range = c(1000, 0.5))
spde_model3 = inla.spde2.pcmatern(mesh,
prior.sigma = c(1, 0.5),
prior.range = c(100, 0.5))And here we plot the different priors for the range:
ggplot() + geom_line(data = dens_prior_range(30,.5), aes(x,y, color = "model1")) +
geom_line(data = dens_prior_range(1000,.5), aes(x,y, color = "model2")) +
geom_line(data = dens_prior_range(100,.5), aes(x,y, color = "model3")) and for the sd:
ggplot() + geom_line(data = dens_prior_range(1,.5), aes(x,y, color = "model1")) +
geom_line(data = dens_prior_range(10,.5), aes(x,y, color = "model2")) +
geom_line(data = dens_prior_range(.1,.5), aes(x,y, color = "model3")) 3. Define the components of the linear predictor
We have now defined a mesh and a SPDE representation of the spatial GF. We now need to define the model components:
cmp = ~ Intercept(1) + space(geometry, model = spde_model3)NOTE since the dataframe we use (pcod_sf) is an sf object the input in the space() component is the geometry of the dataset.
4. Define the observation model
Our data are Bernulli distributed so we can define the observation model as:
formula = present ~ Intercept + space
lik = bru_obs(formula = formula,
data = pcod_sf,
family = "binomial")5. Run the model
Finally we are ready to run the model
fit1 = bru(cmp,lik)Extract results
Hyperparameters
Spatial prediction
We now want to extract the estimated posterior mean and sd of spatial GF. To do this we first need to define a grid of points where we want to predict. We do this using the function fm_pixel() which creates a regular grid of points covering the mesh
pxl = fm_pixels(mesh)then compute the prediction for both the spatial GF and the linear predictor (spatial GF + intercept)
preds = predict(fit1, pxl, ~data.frame(spatial = space,
total = Intercept + space))Finally, we can plot the maps
ggplot() + geom_sf(data = preds$spatial,aes(color = mean)) + scale_color_scico() + ggtitle("Posterior mean")ggplot() + geom_sf(data = preds$spatial,aes(color = sd)) + scale_color_scico() + ggtitle("Posterior sd")Note The posterior sd is lowest at the observation points. Note how the posterior sd is inflated around the border, this is the “border effect” due to the SPDE representation.
Instead of predicting over a grid covering the whole mesh, we can limit our predictions to the points where the covariate is defined. We can do this by defining a sf object using coordinates in the object depth_r.
pxl1 = data.frame(crds(depth_r),
as.data.frame(depth_r$depth)) %>%
filter(!is.na(depth)) %>%
st_as_sf(coords = c("x","y"))Instead of computing the posterior mean and standard deviations of the estimated surface, one can also simulate possible realizations of such surface. This will give the user a better idea of the type of realized surfaces one can expect. We can do this using the function generate().
# we simulate 4 samples from the
gens = generate(fit1, pxl1, ~ (Intercept + space),
n.samples = 4)
pp = cbind(pxl1, gens)
pp %>% select(-depth) %>%
pivot_longer(-geometry) %>%
ggplot() +
geom_sf(aes(color = value)) +
facet_wrap(.~name) +
scale_color_scico(direction = -1) +
ggtitle("Sample from the fitted model")An alternative model
We now want to check if the depth covatiate has an influende on the probability of presence. We do this in two different models
- Model 1 The depth enters the model in a linear way. The linear predictor is then defined as:
\[ \eta(s) = \text{logit}(p(s)) = \beta_0 + \omega(s) + \beta_1\ \text{depth}(s) \]
- Model 1 The depth enters the model in a non linear way. The linear predictor is then defined as:
\[ \eta(s) = \text{logit}(p(s)) = \beta_0 + \omega(s) + f(\text{depth}(s)) \] where \(f(.)\) is a smooth function. We will use a RW2 model for this.
We now want to fit Model 2 where we allow the effect of depth to be non-linear. To use the RW2 model we need to group the values of depth into distinct classe. To do this we use the function inla.group() which, by default, creates 20 groups. The we can fit the model as usual
# create the grouped variable
depth_r$depth_group = inla.group(values(depth_r$depth_scaled))
# run the model
cmp = ~ Intercept(1) + space(geometry, model = spde_model3) +
covariate(depth_r$depth_group, model = "rw2")
formula = present ~ Intercept + space + covariate
lik = bru_obs(formula = formula,
data = pcod_sf,
family = "binomial")
fit3 = bru(cmp, lik)
# plot the estimated effect of depth
fit3$summary.random$covariate %>% ggplot() + geom_line(aes(ID,mean)) +
geom_ribbon(aes(ID, ymin = `0.025quant`,
ymax = `0.975quant`), alpha = 0.5)